
[CORE-7061] sr/store: Deep copy schema_def iobufs at the interface #23114

Merged

Conversation


@oleiman oleiman commented Aug 28, 2024

sharded_store's main purpose is to supply a consistent interface to an underlying (sharded) in-memory store. So to process a given request, it must first work out the home shard for the desired schema ID or subject, dispatching requests to that shard.

Recently, schema registry was refactored to store schema definitions as iobufs (rather than contiguous strings) to avoid large allocations. To avoid semantically unnecessary copies, get_schema_definition was implemented to return an iobuf share across the shard boundary. This is an unsafe operation, as the reference counting and deletion machinery of the underlying temporary buffers is completely unsynchronized. This is UB in principle (a data race) and leads to memory access faults (use-after-free, etc.) in practice.

So this commit changes 'store::get_schema_definition' to return a copy of the stored schema definition (iobuf), eliminating any cross-shard sharing of the underlying temporary buffers. This approach is rather blunt and carries some cost at deallocation time (the seastar allocator must work out the home shard for reclaiming each component temporary buffer of a given schema definition), but it should be safe since the component temporary buffers are never live on multiple shards simultaneously.

Note that this path could be optimized somewhat by returning a shared iobuf managed by foreign_ptr and carrying out the necessary copy (a read-only operation) on the requesting shard.

This commit also includes a "unit test" that is more of a memory access canary. Reverting the change in 'store::get_schema_definition' causes the test to fail reliably, producing memory faults in line with the reasoning above. With the change in place, the test reliably runs to completion without issue.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

Bug Fixes

  • Fixes a regression in schema registry where concurrent requests could lead to memory faults

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
@oleiman oleiman requested a review from BenPope August 28, 2024 18:57
@oleiman oleiman self-assigned this Aug 28, 2024
@oleiman oleiman requested a review from ballard26 August 28, 2024 18:58
@oleiman (Member Author) commented Aug 28, 2024

@ballard26 - talked this over with @BenPope a bit, and we came to the conclusion that returning an iobuf copy across the shard boundary should be safe since it's temporaries all the way out to the call site and no data is shared.

It's a pessimization, and I assume we could do something w/ foreign_ptr to push the copy operation out to the call site. But it doesn't seem strictly necessary here since the path isn't particularly hot. WDYT? I get the impression that the seastar allocator should do the right thing when it tears down the copy, but I don't exactly know.

@ballard26 (Contributor) commented

> @ballard26 - talked this over with @BenPope a bit, and we came to the conclusion that returning an iobuf copy across the shard boundary should be safe since it's temporaries all the way out to the call site and no data is shared.
>
> It's a pessimization, and I assume we could do something w/ foreign_ptr to push the copy operation out to the call site. But it doesn't seem strictly necessary here since the path isn't particularly hot. WDYT? I get the impression that the seastar allocator should do the right thing when it tears down the copy, but I don't exactly know.

I agree that using a foreign_ptr isn't needed here. It's done in the fetch path as an optimization to defer the copy, since we don't always include the read batches in the fetch reply; in those cases, deferring allows the fetch to avoid the copy altogether.

The seastar allocator will also recognize that the memory being freed doesn't belong to the current shard and will return it to the proper shard, so no worries there.

The only real concern is whether two shards can hold references to one of the underlying temporary_buffers in parallel, since I think that's the cause of most if not all of the race conditions. As you said, since the copied iobuf is never shared on the original shard, it should be safe to move it to the foreign shard and destroy it there.

One gotcha may be that in debug mode we restrict certain iobuf operations to the shard the buffer was originally allocated on. So if a dev tries any of those operations on the copied iobuf, they may find debug-mode tests failing. That isn't a correctness issue, though, and is easily discovered.

@BenPope BenPope left a comment


Awesome

Comment on lines +35 to +36
// Upsert a large(ish) number of schemas to the store, all with different
// subject names and IDs, so they should hash to different shards.
Member


The same schema should end up with the same ID, so the comment is a little confusing.

@oleiman (Member Author) commented Aug 29, 2024

In real operation, yeah, but I thought that disambiguation happened somewhere else, like on the command apply path? It looks like when I directly upsert, the sharded store is happy to shove an (id, def) pair into the in-memory schema map on shard hash(id), as requested. I might be missing/misreading something.

@michael-redpanda michael-redpanda merged commit 02cb28b into redpanda-data:dev Aug 29, 2024
20 checks passed
@vbotbuildovich (Collaborator) commented

/backport v24.2.x

@vbotbuildovich (Collaborator) commented

/backport v24.1.x

@vbotbuildovich (Collaborator) commented

/backport v23.3.x

